-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JAMES-3142 eventsourcing for group unregistration #3280
Conversation
</dependency> | ||
<dependency> | ||
<groupId>${james.groupId}</groupId> | ||
<artifactId>event-sourcing-event-store-memory</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ordering issue
|
||
import reactor.core.publisher.Mono; | ||
|
||
public class GroupUnregistringManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about GroupRegistry
?
...n/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupListenerChangeEvent.java
Outdated
Show resolved
Hide resolved
...q/src/main/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupsAggregate.java
Outdated
Show resolved
Hide resolved
...event-rabbitmq/src/main/java/org/apache/james/mailbox/events/eventsourcing/StartCommand.java
Outdated
Show resolved
Hide resolved
|
||
public GroupUnregistringManager(EventStore eventStore, UnregisterRemovedGroupsSubscriber.Unregisterer unregisterer) { | ||
this.eventSourcingSystem = EventSourcingSystem.fromJava(ImmutableSet.of(new StartCommandHandler(eventStore)), | ||
ImmutableSet.of(new UnregisterRemovedGroupsSubscriber(unregisterer)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no delivery guarantee for event handlers, what will happen if we miss one event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no delivery guarantee for event handlers, what will happen if we miss one event?
The error scenario here is a server stop as our eventsourcing-eventbus is in memory based.
I guess in this very unlikely case a manual admin intervention to cliclk the "unbind" button would be acceptable. (I don't see how we could do better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error scenario here is a server stop as our eventsourcing-eventbus is in memory based.
You should not try to describe all errors cases because:
- you'll always miss some
- the system this code depends on will change over time
Either we can enforce transactionality or we can't.
In this case, as we are using handlers, we are not in a transaction and we will miss an event sooner or later.
I can't think of a better solution than a handler but it means we have to figure out what happens in this failure case.
I guess in this very unlikely case a manual admin intervention to cliclk the "unbind" button would be acceptable. (I don't see how we could do better)
How he would know he need to do that? (sorry, I tried to find a solution but can't in a decent timeframe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have an aggregate handling 3 state for a group:
- used
- used but still binded
- not used and unbinded
Then if the unRegisterSubscriber succeeds, it can fire a UnbindSucceededCommand on the aggregate.
That way we might attempt several time an unbind operation, without manual admin operation.
test this please |
test this please |
assertThat(unregisterer.unregisteredGroups()) | ||
.containsExactly(GROUP_A, GROUP_B); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the test :
testee.start(ImmutableSet.of(GROUP_A, GROUP_B)).block();
testee.start(ImmutableSet.of(GROUP_A)).block();
testee.start(ImmutableSet.of(GROUP_A, GROUP_B)).block();
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GroupB would be unregistered once, and re-added in the aggregate.
I don't see the value of this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to detect a missing unbind?
What if we have a repeated task (I guess we don't have that feature in Task Manager yet) that list queues and send them to our aggregate to check if any action is required?
...q/src/main/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupsAggregate.java
Outdated
Show resolved
Hide resolved
...q/src/main/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupsAggregate.java
Outdated
Show resolved
Hide resolved
...q/src/main/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupsAggregate.java
Outdated
Show resolved
Hide resolved
...q/src/main/java/org/apache/james/mailbox/events/eventsourcing/RegisteredGroupsAggregate.java
Show resolved
Hide resolved
Logging the error. Also that will eventually be corrected so what is the point of doing better?
Looks like a design involving a distributed scheduler. Not sure that is more reasonable than the proposed solution. Hence, for the time being, the two realistic ways of triggering the aggregate is either:
In any case we would have to prevent triggering the aggregate while groups are partially registered. |
...mq/src/main/java/org/apache/james/mailbox/events/eventsourcing/GroupUnregistringManager.java
Outdated
Show resolved
Hide resolved
@@ -129,6 +125,9 @@ private RegisteredGroupsAggregate(History history) { | |||
} | |||
|
|||
public List<UnbindSucceededEvent> handle(MarkUnbindAsSucceededCommand command) { | |||
Preconditions.checkArgument(state.bindedGroups().contains(command.getSucceededGroup()), | |||
"unbing a non binded group, or a used group"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/unbing/unbind
it's not "detection" is "sending a bottle in the sea" (:
That's the question: how?
Are we going to require James admin to setup a cron for every jobs we want to be scheduled? |
That's a productive remark. I don't have better ideas. I bet this can always come as future enhencements. We could for example think about a healthcheck loading the aggregate to ensure all the groups are unbinded. And have a webadmin endpoint for triggering the aggregate without requiring a restart. These are very valuable overall design remarks (thanks) however this have nothing to do with how the event sourcing stuff works. In short: let's discuss system design that don't have code impact in other channels. |
Again the way you call the aggregate is orthogonal to it's inner working. You are discussing things unrelated to my work here. I would have loved to have such discussion in a grooming meeting. So yes, let's have design discussion, let's improve our system design in a progressive fashion. Also, a distributed scheduler is as far as I understand a hundred man day work feature. Linking a bugfix to such a feature is the best way to nether get it fixed. |
test this please |
I understand very well your frustration: you wrote some nice code handling an immediate issue and you are not happy "loosing time" before having it merged and useful. I'm not against merging the first try at the feature as it will fix part of the problem and make people happier. However, I can't be blame for not having anticipated all the design decision during the grooming. And I think you can agree that the way we solve the problem here is a partial solution because in some cases, a queue won't be unbound (and the goal of this feature is to make sure it doesn't happen). So we can move the discussion elsewhere if you think it's better suited but let's not fool ourselves: this issue won't be "done" thanks to this PR. |
The JIRA ticket is the right place.
That's far from over: we need to initialize groups "at once" anyway |
Looking at the discussion, I think that the EventHandler doing the queue unbinding should not emit a command. It makes everything more complex and doesn't seem to solve any issue. The problem that's left is: the projection (in this case the group listener queues) can diverge from the source of truth. We can compute easily the expected list of group listeners that should exist so we can recompute the projection from the aggregate history. The only question left, in my opinion, is: how do we detect it diverged and when/how do we fix it? Do we agree? |
Without that you do not get eventual consistence as the aggregate can't know if the unbinding is done or not. That is the clever way I found for resolving your concern expressed here: #3280 (comment)
Adapting the aggregate should be easy then. I would agree with this even if we would end up with a pattern matching strategy to retrieve such a list.
A/ Upon start. B/ Or via a healthcheck. C/ Or via a webadmin end point D/ Or via a cron task within the TaskManager D/ -> not realistic. A/ is the moment the list of group change. It handles the happy scenario which is far better than the nothing is done today scenario. A reboot being needed, it's not really admin friendly upon error yet it allows an admin to still fix it. B/ would be a good diagnostic tool. C/ alone is not nice: can we really trust the admin to really remember he should call it after unconfiguring a group? So I believe A + B (diagnostic only) + C is the realistic midlle term solution. However the definition of done of our task, that is acceptable from an admin standpoint is A. |
See #3303 |
With it you don't have it neither (the command can fail). Thus I challenge the cost of adding the command wrt to result we get.
It doesn't handle the detection part but could fix it, yes
Why not. BTW it happens that we implemented a scheduler in Healthcheck to log statuses. It looks ok as a detection mechanism.
Not ok for detection but could be a good way to fix problems.
We agree
Definitely not my opinion. |
Then please express your opinion. I'm tired of gessing it. Again that don't mean it can't be done in a close future, that sprint content can't be handled, or that it can't fit in a future Sprint. So we implement A/ now. And plan work on B and C as we think it is needed. Would you agree with this? |
My opinion:
|
What about now? |
We do A. We create tickets for B & C. And will add it in following sprints. Expressing such concerns at grooming time would have been welcome. |
The linagora issue has a DOD, not the apache one. |
No description provided.